Churros
library(readr)
library(ggplot2)
library(dplyr)
## Warning: package 'dplyr' was built under R version 3.5.2
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(methods)
library(stringi)
library(keras)
## Warning: package 'keras' was built under R version 3.5.2
library(glmnet)
## Loading required package: Matrix
## Loading required package: foreach
## Loaded glmnet 2.0-16
The dataset I decided to use for my image classification project is called “Food-101”. The full dataset has 101 categories of food with 1,000 images in each category; however, I only use a subset of 10 categories to make the processing easier. Out of the 1,000 images in each food category, the 750 training images have deliberately not been cleaned, so as to make the classification algorithm more robust. The categories I use in my image classification task, with the number of images in each category, are:
input_dir <- "~/Desktop/Statistical Learning/Project/food-101/myimages"
class_vector <- dirname(dir(input_dir, recursive = TRUE))
cbind(table(class_vector))
## [,1]
## apple_pie 1000
## cheesecake 1000
## churros 1000
## donuts 1000
## french_fries 1000
## grilled_cheese_sandwich 1000
## pancakes 1000
## pizza 1000
## samosa 1000
## tacos 1000
I start off by loading in my dataset, which has already been processed and embedded in a separate file. Along with that, I load in the metadata for the dataset.
#loading in the data set
img_data <- read.csv("~/Desktop/Statistical Learning/Project/my-image-data.csv")
X2 <- read_rds("~/Desktop/Statistical Learning/Project/my-image-embed.rds")
To make sure the images were loaded correctly, let’s look at a few sample images from the category “apple pie”.
#looking at the images
paths <- dir(input_dir, recursive = TRUE, full.names=TRUE)
par(mar = c(0,0,0,0))
par(mfrow = c(4, 6))
set.seed(1)
for (i in 1:24) {
  # draw an empty plot, then fill it with the image raster
  plot(0, 0, xlim = c(0, 1), ylim = c(0, 1), axes = FALSE, type = "n")
  Z <- image_to_array(image_load(paths[i], target_size = c(224, 224)))
  rasterImage(Z / 255, 0, 0, 1, 1)
}
Here’s another sample of images, this time from the category “samosa”.
par(mar = c(0,0,0,0))
par(mfrow = c(4, 6))
set.seed(1)
for (i in 8001:8024) {
  plot(0, 0, xlim = c(0, 1), ylim = c(0, 1), axes = FALSE, type = "n")
  Z <- image_to_array(image_load(paths[i], target_size = c(224, 224)))
  rasterImage(Z / 255, 0, 0, 1, 1)
}
As you can see, the images haven't been cleaned and contain some noise (wrong labels, intense colors), which makes this classification task particularly challenging.
The model I used for transfer learning here is the ResNet-50 model, which was trained as part of the ImageNet challenge. The corpus used for the challenge contained images of size 224x224 pixels, which is why I decided to go with the Food-101 dataset as opposed to my original choice, CIFAR-10.
I decided to add two dense layers prior to the final layer, and I found that the neural network seemed to perform better if I start with a large number of units and then gradually decrease the number of units. At every dense layer, I also decided to use batch normalization (which I came across as part of the Keras documentation).
The basic idea behind batch normalization is to normalize the inputs of each layer so that they have a mean output activation of zero and a standard deviation of one. This regularization also reduces overfitting and helps the model generalize better (according to the original paper on batch normalization). During hyperparameter tuning I also saw that smaller dropout values worked better than larger ones in the intermediate layers, so I went with a dropout rate of 0.5. I also kept the learning rate small (so that we do not have any exploding gradients).
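To make the idea concrete, here is a minimal sketch of training-time batch normalization in plain R (no Keras; the learned scale and shift parameters of a real batch-norm layer are omitted for simplicity, and the function name is illustrative):

```r
# Toy sketch: normalize a batch of activations to roughly mean 0, sd 1.
batch_norm <- function(x, eps = 1e-5) {
  (x - mean(x)) / sqrt(var(x) + eps)
}

x <- c(10, 12, 9, 14, 11)
z <- batch_norm(x)
mean(z)  # ~ 0
sd(z)    # ~ 1
```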
Lastly, I decided to go with the “sigmoid” activation, since it seemed to perform better than “relu” and “selu”. I present my final version of the model below.
#creating transfer learning model
X_train <- X2[img_data$train_id == "train",]
y_train <- to_categorical(img_data$class[img_data$train_id == "train"])
model <- keras_model_sequential()
model %>%
  layer_dense(units = 256, input_shape = ncol(X_train)) %>%
  layer_batch_normalization() %>%
  layer_activation(activation = "sigmoid") %>%
  layer_dropout(rate = 0.5) %>%
  layer_dense(units = 128) %>%
  layer_batch_normalization() %>%
  layer_activation(activation = "sigmoid") %>%
  layer_dropout(rate = 0.5) %>%
  layer_dense(units = ncol(y_train)) %>%
  layer_activation(activation = "softmax")
model %>% compile(loss = "categorical_crossentropy",
                  optimizer = optimizer_rmsprop(lr = 0.001 / 2),
                  metrics = c("accuracy"))
model
## Model
## ___________________________________________________________________________
## Layer (type) Output Shape Param #
## ===========================================================================
## dense (Dense) (None, 256) 524544
## ___________________________________________________________________________
## batch_normalization (BatchNormal (None, 256) 1024
## ___________________________________________________________________________
## activation (Activation) (None, 256) 0
## ___________________________________________________________________________
## dropout (Dropout) (None, 256) 0
## ___________________________________________________________________________
## dense_1 (Dense) (None, 128) 32896
## ___________________________________________________________________________
## batch_normalization_1 (BatchNorm (None, 128) 512
## ___________________________________________________________________________
## activation_1 (Activation) (None, 128) 0
## ___________________________________________________________________________
## dropout_1 (Dropout) (None, 128) 0
## ___________________________________________________________________________
## dense_2 (Dense) (None, 10) 1290
## ___________________________________________________________________________
## activation_2 (Activation) (None, 10) 0
## ===========================================================================
## Total params: 560,266
## Trainable params: 559,498
## Non-trainable params: 768
## ___________________________________________________________________________
After about 10 epochs the model seemed to be overfitting: the training accuracy kept going up while the validation accuracy plateaued.
#fit data
history <- model %>%
  fit(X_train, y_train, epochs = 10)
plot(history)
Following are the best results I got from the various versions of the models I ran:
y_pred <- predict_classes(model, X2)
tapply(img_data$class == y_pred, img_data$train_id, mean)
## train valid
## 0.94600 0.84375
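The tapply() call above averages the correctness indicator `img_data$class == y_pred` within each level of the train/validation split. As a self-contained illustration of that idiom (the vectors below are made up, not taken from the model):

```r
# Toy example: mean of a correctness indicator per split.
correct <- c(TRUE, TRUE, FALSE, TRUE, FALSE, TRUE)
split   <- c("train", "train", "train", "valid", "valid", "valid")
tapply(correct, split, mean)  # 2/3 accuracy in each split
```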
Here is the confusion matrix that resulted from the predictions of my model.
y = img_data$class
table(y[img_data$train_id == "valid"], y_pred[img_data$train_id == "valid"])
##
## 0 1 2 3 4 5 6 7 8 9
## 0 306 11 6 4 1 33 19 9 10 1
## 1 31 319 3 12 2 17 7 4 4 1
## 2 8 10 356 7 6 3 4 1 4 1
## 3 23 8 12 329 4 7 7 2 3 5
## 4 2 0 2 0 378 10 0 0 2 6
## 5 10 4 6 2 34 322 5 2 9 6
## 6 29 25 2 4 1 20 309 5 4 1
## 7 6 1 5 1 2 4 2 370 1 8
## 8 22 4 3 4 4 12 2 1 338 10
## 9 3 2 1 2 3 23 2 6 10 348
The confusion matrix presents some very interesting findings. First, let’s look at the category name associated with each class label so we can read the matrix.
categories <- c("apple_pie", "cheesecake", "churros", "donuts", "french_fries", "grilled_cheese_sandwich", "pancakes", "pizza", "samosa", "tacos")
class_names <- 0:9
df <- cbind(class_names, categories)
df
## class_names categories
## [1,] "0" "apple_pie"
## [2,] "1" "cheesecake"
## [3,] "2" "churros"
## [4,] "3" "donuts"
## [5,] "4" "french_fries"
## [6,] "5" "grilled_cheese_sandwich"
## [7,] "6" "pancakes"
## [8,] "7" "pizza"
## [9,] "8" "samosa"
## [10,] "9" "tacos"
As we can see, some of the categories that the classification model confused repeatedly were: apple pie and grilled cheese sandwich (most commonly confused), cheesecake and pancakes, and apple pie and cheesecake. Now let’s pull up some images that the model misclassified.
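One way to quantify this kind of confusion is to compute per-class recall (the diagonal divided by the row sums) and to rank the off-diagonal cells of the confusion matrix. Here is a minimal sketch on a made-up 3x3 confusion matrix (the class names and counts are illustrative only, not from the model above):

```r
# Toy confusion matrix: rows = true class, columns = predicted class.
cm <- matrix(c(90,  5,  5,
               12, 80,  8,
                2,  3, 95),
             nrow = 3, byrow = TRUE,
             dimnames = list(true = c("a", "b", "c"),
                             pred = c("a", "b", "c")))

recall <- diag(cm) / rowSums(cm)                 # per-class recall
off <- cm; diag(off) <- 0                        # ignore correct predictions
worst <- which(off == max(off), arr.ind = TRUE)  # most-confused (true, pred) pair
recall  # a: 0.90, b: 0.80, c: 0.95
worst   # true class b most often predicted as a
```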
Here are a few of the classifications that the model got wrong:
par(mfrow = c(2, 3))
id <- which(y_pred != y)
for (i in id[2:7]) {
  par(mar = rep(0, 4L))
  plot(0, 0, xlim = c(0, 1), ylim = c(0, 1), axes = FALSE, type = "n")
  Z <- image_to_array(image_load(paths[i], target_size = c(224, 224)))
  rasterImage(Z / 255, 0, 0, 1, 1)
  text(0.5, 0.1, labels = categories[y_pred[i] + 1L], col = "red", cex = 2)
}
And here are the images the model classified with the highest probability in each class:
y_probs <- predict(model, X2)
id <- apply(y_probs, 2, which.max)
par(mfrow = c(2, 4))
for (i in id[1:8]) try({
  par(mar = rep(0, 4L))
  plot(0, 0, xlim = c(0, 1), ylim = c(0, 1), axes = FALSE, type = "n", asp = 1)
  Z <- image_to_array(image_load(paths[i], target_size = c(224, 224)))
  rasterImage(Z / 255, 0, 0, 1, 1)  # divide by 255 to map pixel values into [0, 1]
})
Lastly, we can visualize the kernels.
layer <- get_layer(model, index = 1)
par(mar = c(0,0,0,0))
par(mfrow = c(16,16))
for (i in 1:256) {
  wg <- layer$get_weights()[[1]][, i]
  im <- abs(wg) / max(abs(wg))
  plot(0, 0, xlim = c(0, 1), ylim = c(0, 1), axes = FALSE, type = "n")
  rasterImage(im, 0, 0, 1, 1, interpolate = FALSE)
  box()
}
I tried to visualize the embeddings using PCA as well; however, the principal component analysis function prcomp() was very slow on my computer and repeatedly caused the R session to abort.
#pca <- as_tibble(prcomp(X2)$x[,1:2])
#pca$y <- categories[y + 1L]
#ggplot(pca, aes(PC1, PC2)) +
#geom_point(aes(color = y), size = 4) +
#labs(x = "", y = "", color = "class") +
#theme_minimal()
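As a possible workaround, the first two principal components can be obtained from a truncated SVD of the centered matrix without computing the full rotation. The sketch below uses base R's svd() on random stand-in data (X here is only a placeholder for the embedding matrix X2; for genuinely large matrices a dedicated truncated solver such as the irlba package would be the more robust choice):

```r
set.seed(1)
X <- matrix(rnorm(200 * 50), nrow = 200)      # stand-in for the embedding matrix
Xc <- scale(X, center = TRUE, scale = FALSE)  # center columns, as prcomp() does
s <- svd(Xc, nu = 2, nv = 2)                  # keep only the top 2 singular vectors
pcs <- s$u %*% diag(s$d[1:2])                 # PC scores, matching prcomp(X)$x[, 1:2] up to sign
dim(pcs)  # 200 x 2
```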
The transfer learning classification model was able to correctly classify about 85% of the food images in the validation set. Considering the quality of the images, this result seems fairly reasonable to me.